logical function
Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures
This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that, when learning logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that in the distribution-shift setting, when the data withholding corresponds to freezing a single feature (referred to as canonical holdout), the generalization error of gradient descent admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown for linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures, and for learning logical functions such as PVR functions, GD tends to have an implicit bias towards low-degree representations, which in turn yields the Boolean influence as the generalization error under quadratic loss.
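To make the Boolean measures in the abstract above concrete, here is a minimal sketch that brute-forces the Boolean influence of each input bit for a toy PVR-style function. The specific function (a 2-bit pointer selecting a 2-bit window whose parity is the label) is an illustrative stand-in, not the exact benchmark instance from [ZRKB21].

```python
from itertools import product

def pvr(bits):
    """Toy PVR-style function on 6 bits: the first 2 bits form a pointer
    p in {0,1,2,3}; the label is the parity of a length-2 window of the
    remaining 4 bits starting at position p (wrapping around)."""
    p = 2 * bits[0] + bits[1]
    payload = bits[2:]
    window = [payload[(p + k) % len(payload)] for k in range(2)]
    return sum(window) % 2

def influence(f, n, i):
    """Boolean influence of coordinate i: the probability over a uniform
    input x that flipping bit i changes f(x)."""
    count = 0
    for x in product([0, 1], repeat=n):
        y = list(x)
        y[i] ^= 1
        count += f(list(x)) != f(y)
    return count / 2 ** n

n = 6
for i in range(n):
    print(f"Inf_{i} = {influence(pvr, n, i):.3f}")
```

Under the canonical-holdout characterization described above, freezing a coordinate with larger influence would be expected to incur a larger generalization error.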
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
Xu, Jun, Du, Xinkai, Ao, Yu, Zhao, Peilong, Li, Yang, Zhong, Ling, Yuan, Lin, Bo, Zhongpu, Wang, Xiaorui, Sun, Mengshu, Gui, Zhengke, Zhang, Dalong, Wang, Zhaoyang, Wang, Qiwei, Hou, Yangyang, Yin, Zhiying, Wang, Haofen, Chen, Huajun, Liang, Lei, Zhou, Jun
Efficient retrieval from external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, which makes the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented as both a natural-language question and an equivalent logical function, to support knowledge-base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check whether a sub-problem falls within the LLM's intrinsic knowledge, in which case the model answers it directly. Experimental results indicate that with only a few hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes. The source code is available at https://github.com/OpenSPG/KAG-Thinker.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (15 more...)
- Research Report (0.81)
- Workflow (0.68)
- Leisure & Entertainment (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Media > Film (0.68)
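To illustrate the hierarchical decomposition described in the Thinker abstract above, here is a hypothetical sketch: sub-problems are dually represented as natural language and a logical function, dependencies are passed as parameters, and a knowledge-boundary check decides whether an external retriever is needed. All names, the `Retrieval(...)` syntax, and the confidence threshold are illustrative assumptions, not the KAG-Thinker API.

```python
from dataclasses import dataclass, field

@dataclass
class SubProblem:
    # Dual representation: a natural-language question plus an equivalent
    # logical function whose arguments may reference earlier answers.
    question: str
    logic: str                      # e.g. "Retrieval(s=..., p=..., o=?x0)" (hypothetical syntax)
    depends_on: list = field(default_factory=list)

def within_knowledge_boundary(sub: SubProblem, llm_confidence: float) -> bool:
    # Hypothetical knowledge-boundary check: answer directly only if the
    # model is confident the sub-problem lies in its intrinsic knowledge.
    return llm_confidence >= 0.8

plan = [
    SubProblem("Which company developed AlphaGo?",
               "Retrieval(s=AlphaGo, p=developed_by, o=?x0)"),
    SubProblem("Who is the CEO of that company?",
               "Retrieval(s=?x0, p=ceo, o=?x1)", depends_on=["?x0"]),
]

for sub in plan:
    source = "LLM" if within_knowledge_boundary(sub, llm_confidence=0.9) else "retriever"
    print(f"{sub.logic:45s} -> answered by {source}")
```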
Logifold: A Geometrical Foundation of Ensemble Machine Learning
We present a local-to-global and measure-theoretical approach to understanding datasets. The core idea is to formulate a logifold structure and to interpret network models with restricted domains as local charts of datasets. In particular, this provides a mathematical foundation for ensemble machine learning. Our experiments demonstrate that logifolds can be implemented to identify fuzzy domains and to improve accuracy compared to taking the average of model outputs. Additionally, we provide a theoretical example of a logifold, highlighting the importance of restricting to the domains of classifiers in an ensemble.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (2 more...)
A logifold structure on measure space
In this paper, we develop a local-to-global and measure-theoretical approach to understanding datasets. The idea is to take network models with restricted domains as local charts of datasets. We develop the mathematical foundations for these structures, and show in experiments how they can be used to find fuzzy domains and to improve accuracy in data classification problems.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > South Carolina > Richland County > Columbia (0.04)
- North America > United States > Pennsylvania (0.04)
- (5 more...)
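A minimal sketch of the local-chart idea from the two logifold abstracts above: each chart pairs a classifier with a restricted domain, and a point is labeled by voting only among the charts whose domain contains it. The toy charts and domains below are illustrative assumptions, not a construction from the papers.

```python
import numpy as np

class Chart:
    """A local chart: a classifier together with the subset of the
    dataset (its domain) on which it is considered reliable."""
    def __init__(self, domain, predict):
        self.domain = domain      # callable: x -> bool
        self.predict = predict    # callable: x -> class label

def logifold_predict(charts, x):
    # Restrict to charts whose domain contains x, then majority-vote.
    votes = [c.predict(x) for c in charts if c.domain(x)]
    if not votes:
        return None               # x lies in no chart's domain ("fuzzy" region)
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# Two toy charts on the real line, each reliable on a half-line.
charts = [
    Chart(domain=lambda x: x < 1.0,  predict=lambda x: 0),
    Chart(domain=lambda x: x > -1.0, predict=lambda x: int(x > 0)),
]
for x in (-2.0, 0.5, 2.0):
    print(x, "->", logifold_predict(charts, x))
```

Restricting each classifier to its domain, rather than averaging all model outputs everywhere, is the point both abstracts emphasize.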
Universal Mechanical Polycomputation in Granular Matter
Parsa, Atoosa, Witthaus, Sven, Pashine, Nidhi, O'Hern, Corey S., Kramer-Bottiglio, Rebecca, Bongard, Josh
Unconventional computing devices are increasingly of interest as they can operate in environments hostile to silicon-based electronics, or compute in ways that traditional electronics cannot. Mechanical computers, wherein information processing is a material property emerging from the interaction of components with the environment, are one such class of devices. This information processing can be manifested in various physical substrates, one of which is granular matter. In a granular assembly, vibration can be treated as the information-bearing mode. This can be exploited to realize "polycomputing": materials can be evolved such that a single grain within them can report the result of multiple logical operations simultaneously at different frequencies, without recourse to quantum effects. Here, we demonstrate the evolution of a material in which one grain acts simultaneously as two different NAND gates at two different frequencies. NAND gates are of interest because any logical operation can be built from them. Moreover, they are nonlinear, thus demonstrating a step toward general-purpose, computationally dense mechanical computers. Polycomputation was found to be distributed across each evolved material, suggesting the material's robustness. With recent advances in material sciences, hardware realization of these materials may eventually provide devices that challenge the computational density of traditional computers.
- Europe > Portugal > Lisbon > Lisbon (0.06)
- North America > United States > Connecticut > New Haven County > New Haven (0.05)
- North America > United States > Vermont > Chittenden County > Burlington (0.04)
- (2 more...)
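A hedged sketch of the readout scheme implied by the abstract above: input bits are vibrations injected into the material, and the output grain's amplitude at a probe frequency is thresholded to read a logic value, so one grain can report NAND at two frequencies at once. The response function and frequencies below are synthetic stand-ins constructed to satisfy NAND by definition; they are not measurements from the paper.

```python
def nand(a, b):
    return int(not (a and b))

def reads_as_nand(amplitude, freq, threshold):
    """Check whether thresholding a grain's output amplitude at one
    frequency reproduces the NAND truth table over all input pairs."""
    return all(
        int(amplitude(a, b, freq) > threshold) == nand(a, b)
        for a in (0, 1) for b in (0, 1)
    )

# Synthetic stand-in for an evolved material's response (not measured data):
# the output amplitude drops only when both inputs vibrate, which is
# exactly the NAND pattern.
def toy_amplitude(a, b, freq):
    return 1.0 - 0.9 * (a * b)

for f in (50.0, 80.0):   # two hypothetical probe frequencies (Hz)
    print(f"NAND at {f} Hz:", reads_as_nand(toy_amplitude, f, threshold=0.5))
```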
Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning
Han, Chi, He, Qizheng, Yu, Charles, Du, Xinya, Tong, Hanghang, Ji, Heng
Probabilistic logical rule learning has shown great strength in logical rule mining and knowledge graph completion. It learns logical rules to predict missing edges by reasoning on existing edges in the knowledge graph. However, previous efforts have largely been limited to only modeling chain-like Horn clauses such as $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$. This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables $x$, $y$ and $z$. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph. It is an interpretable representation while allowing for differentiable optimization. We can then incorporate LERP into probabilistic logical rule learning to learn more expressive rules. Empirical results demonstrate that with LERP, our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods. Moreover, we find that our model can discover a more expressive family of logical rules. LERP can also be further combined with embedding learning methods like TransE to make it more interpretable.
- North America > United States > Texas (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Association Learning (1.00)
- (2 more...)
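For reference, a minimal sketch of scoring a chain-like Horn rule of the form $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$ on a toy knowledge graph, with relations as Boolean adjacency matrices so the rule body becomes a matrix product. This is the standard setup the abstract builds on; LERP would additionally attach to the entity variables a vector of probabilistic logical features computed from their neighboring sub-graphs, which is not shown here. The toy graph is fabricated purely for illustration.

```python
import numpy as np

# Toy knowledge graph over 4 entities; each relation is an adjacency matrix.
n = 4
R1 = np.zeros((n, n)); R1[0, 1] = R1[2, 3] = 1   # R1(x, z) edges
R2 = np.zeros((n, n)); R2[1, 2] = R2[3, 0] = 1   # R2(z, y) edges
H  = np.zeros((n, n)); H[0, 2] = 1               # observed head edges H(x, y)

# Body of the chain rule R1(x,z) & R2(z,y): pairs (x, y) connected by a path.
body = (R1 @ R2) > 0

# Rule confidence: fraction of body groundings that also satisfy the head.
support = np.logical_and(body, H > 0).sum()
confidence = support / body.sum()
print("body groundings:", int(body.sum()),
      "support:", int(support),
      "confidence:", confidence)
```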
An Evolutionary-Based Approach to Learning Multiple Decision Models from Underrepresented Data
Schetinin, Vitaly, Li, Dayou, Maple, Carsten
The use of multiple Decision Models (DMs) enhances the accuracy of decisions and at the same time allows users to evaluate confidence in the decision making. In this paper we explore the ability of multiple DMs to learn from a small amount of verified data. This becomes important when data samples are difficult to collect and verify. We propose an evolutionary-based approach to solving this problem. The proposed technique is examined on a few clinical problems represented by a small amount of data.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- North America > United States > Wisconsin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
Explorations with the Dynamic Wave Model
Rebotier, Thomas P., Elman, Jeffrey L.
Following Shrager and Johnson (1995), we study the growth of logical function complexity in a network swept by two overlapping waves: one of pruning, and the other of Hebbian reinforcement of connections. Results indicate a significant spatial gradient in the appearance of both linearly separable and non-linearly separable functions of the network's two inputs; the non-linearly separable cells are much sparser, and their slope of appearance is sensitive to parameters in a highly nonlinear way.
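As background for the linearly separable vs. non-linearly separable distinction in the abstract above, the sketch below enumerates all 16 Boolean functions of two inputs and checks which can be realized by a single threshold unit; the two that cannot are XOR and XNOR, which is one reason non-linearly separable cells would be expected to appear more rarely. The brute-force weight grid is an arbitrary illustrative choice.

```python
from itertools import product
import numpy as np

inputs = list(product([0, 1], repeat=2))

def linearly_separable(truth_table):
    """Brute-force search for weights w1, w2 and bias b such that
    (w1*x1 + w2*x2 + b > 0) reproduces the truth table."""
    grid = np.linspace(-2, 2, 17)
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
                       for (x1, x2), t in zip(inputs, truth_table)):
                    return True
    return False

# Enumerate all 16 Boolean functions of two inputs by their truth tables.
for code in range(16):
    table = [(code >> k) & 1 for k in range(4)]
    if not linearly_separable(table):
        print("not linearly separable:", dict(zip(inputs, table)))
```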